Assessing Dimensionality of a Set of Items: Comparison of Different Approaches


Abstract 

This study examines the performance of four methodologies for assessing
unidimensionality: DIMTEST, Holland and Rosenbaum's approach, linear factor
analysis, and nonlinear factor analysis. Each method is examined and compared with the
other methods on simulated and real data sets. Seven data sets, all with 2000
examinees, were generated: three unidimensional and four two-dimensional. Two
levels of correlation between abilities were considered: ρ = .3 and ρ = .7. Eight real
data sets were used: four were expected to be unidimensional, and the other four
were expected to be two-dimensional. Findings suggest that, while linear factor
analysis often overestimated the number of underlying dimensions, the other three methods
correctly confirmed unidimensionality but differed in their ability to detect lack of
unidimensionality. DIMTEST showed excellent power in detecting lack of
unidimensionality; Holland and Rosenbaum's approach and nonlinear factor analysis
showed good power, provided the correlation between abilities was low.


Subject  terms:  DIMTEST,  unidimensionality,  essential  dimensionality,  non-linear  factor 
analysis,  item  response  theory. 
Assessing Dimensionality: Comparison


It  is  well  known  that  most  item  response  theory  (IRT)  models  require  the 
assumption  of  unidimensionality.  According  to  Lord  and  Novick  (1968),  dimensionality  is 
defined  as  the  total  number  of  abilities  required  to  satisfy  the  assumption  of  local 
independence.  If  there  is  only  one  ability  affecting  the  responses  of  a  set  of  items  to  meet 
the  assumption  of  local  independence,  then  that  set  is  referred  to  as  a  unidimensional  set. 

It  has  also  been  long  argued  that  responses  to  test  items  are  multiply  determined 
(Humphreys, 1981, 1985, 1986; Hambleton & Swaminathan, 1985, chap. 2; Reckase, 1979,
1985; Stout, 1987; Traub, 1983; Yen, 1985), and several abilities unique to items or
common  to  relatively  few  items  are  inevitable.  The  ability  which  the  test  is  intended  to 
measure  (i.e.,  the  ability  common  to  all  items)  will  be  referred  to  as  the  dominant  ability, 
and  abilities  unique  to  or  influencing  responses  to  few  items  will  be  referred  to  as  minor 
abilities.  Given  that  item  responses  are  multiply  determined,  it  is  intuitively  clear  that,  in 
order  to  satisfy  the  assumption  of  unidimensionality,  it  is  required  that  a  given  test 
measure  a  single  dominant  ability.  A  number  of  simulation  studies  have  demonstrated  that 
a  dominant  ability  can  be  recovered  well,  using  computer  programs  such  as  LOGIST,  in 
the  presence  of  several  minor  factors  (Reckase,  1979;  Drasgow  &  Parsons,  1983;  Harrison, 
1986). Although counting only dominant dimensions violates Lord and Novick's (1968)
definition of dimensionality, it is commonly accepted that, in order to apply
unidimensional item response theory models, it is sufficient to show that there is one
dominant ability underlying the responses to a set of items.

Stout  (1987,  1990)  provided  a  mathematically  rigorous  definition  of  dominant 
dimensionality  referred  to  as  essential  dimensionality  and  provided  a  statistical  test 
(DIMTEST)  to  assess  whether  a  set  of  items  met  the  requirement  for  essential 
unidimensionality. Junker (1988, 1991) further explored essential dimensionality for
dichotomous and polytomous items and established consistency results for the maximum
likelihood ability estimates of θ under essential unidimensionality. Essential dimensionality
is the total number of abilities required to satisfy the assumption of essential independence.
An item pool is said to be essentially independent (EI) with respect to the latent variable
vector Θ if, for a given subset of items, the average absolute conditional (on Θ) covariance
of responses to item pairs approaches zero as the length of the subset increases. When
conditional covariances based on only one dominant ability meet the assumption of
essential independence, the response data are said to be essentially unidimensional (d_E = 1).
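
The definition can be written compactly. As a sketch only (the symbols $U_i$ for the response to item $i$ and $N$ for the number of items in the subset are notational assumptions, not taken from the text above), essential independence with respect to $\Theta$ requires, for every $\theta$:

```latex
\frac{1}{\binom{N}{2}} \sum_{1 \le i < j \le N}
  \left| \operatorname{Cov}\left(U_i, U_j \mid \Theta = \theta\right) \right|
  \;\longrightarrow\; 0
  \quad \text{as } N \to \infty .
```

Because the average runs over all N(N-1)/2 item pairs, a relatively small number of pairs with nonzero conditional covariance does not prevent the limit from being zero.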

In contrast, the assumption of local independence requires that the conditional covariances
be zero for responses to every item pair, and the number of abilities required to achieve those
zero conditional covariances is the dimensionality. According to this definition of
dimensionality,  all  major  and  minor  abilities  influencing  item  responses  have  to  be 
considered  when  assessing  the  local  independence  assumption;  whereas,  according  to  the 
essential  dimensionality,  it  is  sufficient  to  consider  only  the  influence  of  dominant  abilities. 
Hence,  essential  independence  and  essential  dimensionality  are  weaker  forms  of  local 
independence  and  traditional  dimensionality  respectively. 

Stout's  definition  of  essential  dimensionality  is  conceptually  based  on  an  infinite 
item pool. An infinite item pool can be conceptualized in two ways: (1) as a consequence of
continuing the test construction process beyond the N items of the test being studied, where
the N items become a subset of the item pool; (2) as a consequence of a sequence of finite
tests  where  each  finite  test  is  optimally  constructed.  For  example,  a  20-item  test  is 
constructed  with  the  knowledge  that  the  test  is  going  to  be  only  20  items  long  and  that  it  is 
not  necessarily  a  subset  of  an  optimal  40-item  test.  In  this  way,  an  item  pool  is  a  collection 
of  optimal  finite  test  length  tests  (for  details  see  Junker,  1991;  Junker  &  Stout,  1991). 

In assessing essential unidimensionality of given item responses, DIMTEST assesses
the likelihood that the given set of item responses comes from an essentially unidimensional
item pool. That is, DIMTEST assesses whether or not the model generating the given item
responses is close to the EI, d_E = 1 model. The major focus in assessing essential
unidimensionality of a given set of item responses is to determine how "minor" the
influence of minor abilities is and whether the influence of these minor abilities can be
ignored when assessing essential unidimensionality.

Historically  speaking,  linear  factor  analysis  has  been  used  to  assess  the 
dimensionality  of  the  latent  space  underlying  the  responses  to  a  set  of  items.  If  the  results 
indicate  a  one-factor  solution,  then  it  can  be  inferred  that  one  dominant  ability  is 
influencing  item  responses.  There  are,  however,  a  number  of  technical  as  well  as 
methodological  problems  associated  with  using  linear  factor  analyses  to  assess 
dimensionality. For example, difficulty levels of items and guessing levels of
multiple-choice items can each play a major role in affecting the factor structure of item
responses (for details see Carroll, 1945; Hulin, Drasgow, & Parsons, 1983, chap. 8; Zwick,
1987). Consequently, many attempts have been made by researchers in recent years to
develop  new  methods  to  assess  dimensionality.  Some  of  the  recently  developed  methods 
include  nonlinear  factor  analysis  (McDonald  &  Ahlawat,  1974);  Bejar's  procedure  (Bejar, 
1980);  order  analysis  (Wise,  1981);  modified  parallel  analysis  (Hulin,  Drasgow,  &  Parsons, 
1983,  p.  255);  residual  analysis  (Hambleton  &  Swaminathan,  1985,  p.  163);  Bock's  full 
information  factor  analysis  (Bock,  Gibbons,  &  Muraki,  1985);  Holland  and  Rosenbaum's 
test  of  unidimensionality,  monotonicity,  and  conditional  independence  (Rosenbaum,  1984; 
Holland  &  Rosenbaum,  1986);  Roznowski,  Tucker,  and  Humphreys’  procedures  (1991);  and 
Stout's  unidimensionality  procedure  DIMTEST  (Stout,  1987). 

Hattie  (1985),  Hambleton  and  Rovinelli  (1986),  and  Berger  and  Knol  (1990)  have 
reviewed  several  procedures  for  assessing  dimensionality,  including  some  of  the  above 
mentioned  procedures.  The  main  focus  of  this  paper  is  to  study  and  compare  some  of  the 
procedures  to  assess  dimensionality  that  are  most  recent,  seem  promising,  and  are  little 
studied.  Four  procedures  are  considered  and  compared  in  this  paper:  DIMTEST,  Holland 
and  Rosenbaum's  procedure,  nonlinear  factor  analysis,  and  linear  factor  analysis.  Linear 
factor analysis was used, because of its historical importance, as a benchmark against which
to compare the other procedures. Several sets of unidimensional and multidimensional test data
were simulated and used to study the performance of all four procedures for assessing
dimensionality. The same procedures were then applied to real test data.

Description  of  Procedures 
Linear  Factor  Analysis 

Linear factor analysis is the most commonly used approach to assess dimensionality.
With linear factor analysis, each extracted factor is presumed to represent a dimension,
and items that load heavily on a given factor are considered good measures of that
dimension. There are a number of fundamental problems associated with applying linear
factor analysis to binary data. First, linear factor analysis assumes that the relationship
between the observed variables and the underlying factors is linear and that the variables
are continuous in nature. But it is clear for dichotomous data that the relationship between
the performance and the underlying latent variable is not linear. Hence, applying factor
analysis to phi or tetrachoric correlations of binary item responses produces difficulty
factors (Hulin, Drasgow, & Parsons, 1983, chap. 8). Second, in computing tetrachoric
correlations, the cell entries of the fourfold table for a pair of dichotomous items sometimes
equal zero, making it difficult to determine an appropriate value for the correlation. Third,
determining the number of significant factors can be problematic.

In this study the statistical package LISCOMP was used to perform exploratory
linear factor analysis using tetrachoric correlations. Three different approaches were used
to determine the number of significant factors: parallel analysis, the chi-square test of
goodness of fit, and goodness-of-fit statistics (the means and standard deviations of the
squares of residual correlations and absolute residuals).

In parallel analysis (Humphreys & Montanelli, 1975), the eigenvalues of
the given correlation matrix are compared with the eigenvalues of random data. The
random data consist of binary responses generated with the same number of items and
examinees as the given data. The largest eigenvalue from the random data is used
as the cutoff point for eigenvalues from the actual data to determine the number of
significant factors. That is, the number of eigenvalues of the actual data greater than the
largest eigenvalue of the random data is taken as the number of significant factors
underlying the given data.
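
The parallel-analysis rule just described can be sketched in a few lines. This is a minimal illustration, not the LISCOMP procedure used in the study: it assumes Pearson (phi) correlations in place of tetrachoric correlations, and it assumes the random items keep the observed proportions correct. The function name `parallel_analysis` and its `seed` argument are hypothetical.

```python
import numpy as np

def parallel_analysis(responses, seed=None):
    """Return (number of significant factors, cutoff eigenvalue).

    responses: 2-D array of 0/1 item scores, one row per examinee.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    n_examinees, n_items = responses.shape

    # Random binary data with the same number of examinees and items.
    # Assumption: each random item keeps the observed proportion correct,
    # but responses are independent across items.
    p_correct = responses.mean(axis=0)
    random_data = (rng.random((n_examinees, n_items)) < p_correct).astype(float)

    # Assumption: Pearson (phi) correlations stand in for tetrachorics.
    observed_eigs = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))
    random_eigs = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))

    # The largest random-data eigenvalue is the cutoff for the real data.
    cutoff = random_eigs.max()
    n_factors = int(np.sum(observed_eigs > cutoff))
    return n_factors, cutoff
```

A call such as `parallel_analysis(data, seed=1)` returns the count of observed eigenvalues above the cutoff together with the cutoff itself, so the two can be reported side by side.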

The second method used to determine the number of factors was the chi-square test
of goodness of fit from LISCOMP. The third method involves comparing the means and
standard deviations of the squared residuals and the absolute residuals after fitting an
m-factor model with the corresponding values from the random data. If the residuals are
sufficiently "small," then one can regard the fit of the model as "reasonably satisfactory"
(McDonald, 1981; Hattie, 1985; Hambleton & Rovinelli, 1986; Berger & Knol, 1990).
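
The residual summaries can be illustrated as follows. This is a sketch under stated assumptions: the factors are taken to be orthogonal, so the reproduced correlation matrix is the loading matrix times its transpose, and only off-diagonal residuals are summarized. The function name `residual_fit_summary` and the dictionary keys are hypothetical.

```python
import numpy as np

def residual_fit_summary(corr, loadings):
    """Summarize off-diagonal residual correlations after an m-factor fit.

    corr:     (n_items, n_items) observed correlation matrix.
    loadings: (n_items, m) factor loading matrix (orthogonal factors assumed).
    """
    corr = np.asarray(corr, dtype=float)
    loadings = np.asarray(loadings, dtype=float)

    # Correlations reproduced by the m-factor model.
    reproduced = loadings @ loadings.T

    # Keep each item pair once (upper triangle, diagonal excluded).
    upper = np.triu_indices_from(corr, k=1)
    residuals = (corr - reproduced)[upper]

    squared = residuals ** 2
    absolute = np.abs(residuals)
    return {
        "mean_squared": squared.mean(),
        "sd_squared": squared.std(),
        "mean_absolute": absolute.mean(),
        "sd_absolute": absolute.std(),
    }
```

Running the same function on the residuals from a fit to random data gives the reference values against which the real-data summaries are judged.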

Nonlinear  Factor  Analysis 

McDonald  (1967, 1980,  1982)  and  McDonald  and  Ahlawat  (1974)  have 
demonstrated  that  applying  linear  factor  analysis  to  unidimensional  binary  data  yields 
"nonlinear  factors"  rather  than  "difficulty  factors."  Nonlinear  factors  account  for  nonlinear 
relationships  among  the  variables  by  using  higher  order  polynomials  in  the  factor  model 
(for  example,  quadratic  and  cubic  terms).  McDonald  developed  the  method  of  nonlinear 
factor  analysis  (NLFA)  to  account  for  the  nonlinearity  of  the  data  as  an  improvement  over 
linear  factor  analysis.  The  variables  in  the  model  can  be  expressed  as  polynomial  functions 
of latent traits or factors. For example, a two-factor model with linear and quadratic
terms  would  be  of  the  following  form: 

$$Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + d_i u_i,$$

where $Y_i$ denotes the examinee's score on item $i$; $\theta_1$ and $\theta_2$ denote latent traits;
$b_{i0}$ is an intercept; $b_{ijk}$ denotes the factor loading of the $i$-th item on the $j$-th
common factor for the $k$-th degree element in the polynomial; $u_i$ denotes the unique factor;
and $d_i$ denotes the unique factor loading for item $i$.
Hambleton and Rovinelli (1986) have demonstrated the use of NLFA to
assess  dimensionality  and  found  it  to  be  a  promising  method.  They,  however,  caution  about 
the  criterion  for  the  adequacy  of  the  fit  of  the  model. 

In the present study, NLFA as embodied in the computer program NOFA, developed
by Etezadi-Amoli and McDonald (1983), was used. The fit of the model is studied just as
in the case of the linear factor analyses, by comparing the means and standard deviations
of squared residuals and absolute residuals with the corresponding values from random data
and linear factor analyses. Chi-square statistic values are not available from NOFA.

Holland  and  Rosenbaum's  Test  of  Lack  of  Fit  of  a 
Unidimensional,  Monotone,  and  Conditional  Independent  Model 

Rosenbaum (1984) and Holland and Rosenbaum (1986) have proved theorems
concerning conditional association that can be applied to assess dimensionality. The basic
notion in Holland and Rosenbaum's (H&R) theorems is that if the items are locally
independent and unidimensional, and the item characteristic curves are monotone, then the
items are conditionally positively associated. Specifically, the conditional covariances
between any pair of item response functions of a set of unidimensional dichotomous item
responses, given any function of the remaining item responses, will be nonnegative. The test
of this relationship can be specified as

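$$H_0: \operatorname{Cov}\Big(X_i, X_j \,\Big|\, \sum_{k \ne i,j} X_k\Big) \ge 0 \quad \text{vs.} \quad H_1: \operatorname{Cov}\Big(X_i, X_j \,\Big|\, \sum_{k \ne i,j} X_k\Big) < 0.$$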

Conditional association for each pair of items is tested, given the number-right
score on the remaining items. The Mantel-Haenszel (M-H) test (Mantel & Haenszel, 1959)
is used to test this hypothesis. To perform the M-H test on a given pair of items, a 2x2
contingency table is constructed for the pair for each of the possible number-right scores
on the remaining items. The cell values of the 2x2 table for item pair i and j for examinees
with total score k (k = 1, 2, ..., K) on the remaining items can be denoted as follows: the
number of examinees who got both item i and item j correct ($n_{11k}$), the number of
examinees who got both item i and item j incorrect ($n_{00k}$), the number of examinees who
got item i correct and item j incorrect ($n_{10k}$), and the number of examinees who got item i
incorrect and item j correct ($n_{01k}$). The M-H statistic is then given by


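$$Z = \frac{n_{11+} - E(n_{11+}) + \tfrac{1}{2}}{\sqrt{V(n_{11+})}}, \tag{1}$$

where $n_{11+} = \sum_{k=1}^{K} n_{11k}$, and $E(n_{11+})$ and $V(n_{11+})$ are the expectation and the variance of
$n_{11+}$, given by

$$E(n_{11+}) = \sum_{k=1}^{K} \frac{n_{1+k}\, n_{+1k}}{n_{++k}} \tag{2}$$

and

$$V(n_{11+}) = \sum_{k=1}^{K} \frac{n_{1+k}\, n_{0+k}\, n_{+1k}\, n_{+0k}}{n_{++k}^{2}\,(n_{++k} - 1)}. \tag{3}$$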

The plus subscript in Equations 2 and 3 denotes summation over that subscript. The
computed Z-value is compared to the lower tail of the standard normal distribution. A
statistically significant Z implies that the pair of items in question is not conditionally
positively associated, given the sum of the remaining items, and is thus inconsistent with the
unidimensional model. In this manner, the M-H statistic is computed for all N(N-1)/2
pairs of items, where N is the total number of items in a test. If a "large" number of pairs
are shown not to be conditionally associated, then the unidimensional assumption is
inappropriate.
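
The computation for a single item pair can be sketched as follows. This is an illustration rather than the program used in the study: the function names are hypothetical, the continuity correction of 1/2 is exposed as a parameter because it is an assumption of this sketch, and score groups with fewer than two examinees are skipped because their variance term is undefined.

```python
import math
import numpy as np

def mantel_haenszel_z(responses, i, j, continuity=0.5):
    """M-H Z statistic for items i and j, stratified by the number-right
    score on the remaining items. responses: 2-D array of 0/1 scores."""
    x = np.asarray(responses, dtype=int)
    rest = x.sum(axis=1) - x[:, i] - x[:, j]   # score on the remaining items

    n11_total = expected = variance = 0.0
    for k in np.unique(rest):
        group = x[rest == k]
        n = group.shape[0]                      # examinees in stratum k
        if n < 2:
            continue                            # variance term undefined
        n_i = int(group[:, i].sum())            # item i correct
        n_j = int(group[:, j].sum())            # item j correct
        n11_total += int(np.sum(group[:, i] * group[:, j]))  # both correct
        expected += n_i * n_j / n
        variance += n_i * (n - n_i) * n_j * (n - n_j) / (n ** 2 * (n - 1))

    if variance == 0.0:
        return float("nan")                     # no informative strata
    return (n11_total - expected + continuity) / math.sqrt(variance)

def lower_tail_p(z):
    """Lower-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))
```

Looping `mantel_haenszel_z` over all item pairs and passing each result to `lower_tail_p` gives one p-value per pair; small p-values flag pairs with negative conditional association.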

Since the H&R approach tests each item pair at significance level α, the simultaneous
inference for all item pairs can be based on Bonferroni bounds (Holland & Rosenbaum,
1986; Junker, 1990; Zwick, 1987). According to Bonferroni bounds, one would accept
H_0 if the number of rejections at level α is around tα, where t is the number of tests
performed, which is equal to N(N-1)/2; one would reject H_0 if at least one test is rejected
at level α/t.
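
The decision rule can be sketched as a small helper. The function name `hr_decision` and the returned keys are hypothetical, and the input is assumed to be one lower-tail p-value per item pair. Because the text leaves "around" the expected count unquantified, the sketch reports the count at level α next to its expected value and applies only the Bonferroni criterion as the formal rejection rule.

```python
def hr_decision(p_values, alpha=0.05):
    """Summarize pairwise tests using the Bonferroni bound.

    p_values: one lower-tail p-value per item pair (t = N(N-1)/2 values).
    """
    t = len(p_values)
    # Rejections at the per-test level alpha, to compare with t * alpha.
    rejections_at_alpha = sum(p < alpha for p in p_values)
    # Rejections at the Bonferroni level alpha / t.
    rejections_at_bonferroni = sum(p < alpha / t for p in p_values)
    return {
        "t": t,
        "expected_at_alpha": t * alpha,
        "rejections_at_alpha": rejections_at_alpha,
        "rejections_at_bonferroni": rejections_at_bonferroni,
        # Reject unidimensionality if any pair is significant at alpha / t.
        "reject_unidimensionality": rejections_at_bonferroni >= 1,
    }
```

For a 25-item test there are 300 pairs, so with α = .05 about 15 rejections at level α would be expected by chance, and the Bonferroni level is .05/300.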

Rosenbaum (1984), Zwick (1987), and Ben-Simon and Cohen (1990) have
demonstrated the application of the H&R approach to assess dimensionality. Ben-Simon and
Cohen found that the H&R approach was conservative and misclassified nearly
half of the multidimensional item pools they analyzed as unidimensional. Zwick found
the H&R approach to be consistent with the other procedures investigated in assessing
unidimensionality of NAEP reading data.

DIMTEST 

Stout (1987) developed DIMTEST to test the hypothesis of essential
unidimensionality: the existence of one dominant dimension. Nandakumar and Stout (in
press) further modified and improved the performance of DIMTEST. The improvements
have led to the following: a procedure that is robust against the presence of guessing in item
responses; better control of the observed level of significance, and greater power; and
automatic determination of the size of the assessment subtests, as described below. The
hypothesis to test unidimensionality can be stated as

$$H_0: d_E = 1 \quad \text{vs.} \quad H_1: d_E > 1,$$

where $d_E$ denotes the essential dimensionality of the item pool of which the given test
items are a part.

In order to apply DIMTEST, it is assumed that a group of J examinees take an
N-item test. Each examinee produces a vector of responses of 1s and 0s, with 1 denoting a
correct response and 0 denoting an incorrect response. It is also assumed that essential
independence with respect to some dominant ability θ holds and that the item response
functions are monotone with respect to the same dominant ability θ. DIMTEST has
several steps. These are briefly described here (for details see Stout, 1987; Nandakumar &
Stout, in press).

Step 1: The N items of the test are split into three subtests: AT1, AT2, and PT.
First, AT1 items are selected so that these items all measure the same dominant ability.
This can be achieved either through factor analysis (FA) or through expert opinion (EO).

If the FA method is chosen, the M items with the highest loadings on the second factor (before
rotation) are selected. In this case, the program automatically determines the size M of
AT1 as a function of the test length and the sample size. If EO is sought, on the other
hand, it is recommended that, at most, one-quarter of the total items should be selected
that tap the same ability. After selecting the items of AT1, items of AT2 are selected, also of
the same size M, so that items of AT1 and AT2 have the same difficulty distribution (for
details see Stout, 1987). The remaining items (n = N - 2M) form the partition subtest PT. In
the present study, FA is chosen to select AT1 items. For examples where EO is used to
select AT1 items, see Nandakumar (in press).

When FA is used to select AT1 items, the given sample of J examinee responses is
partitioned into two groups. One group of examinee responses (500 examinees
recommended) is used for exploratory factor analysis to select AT1 and AT2 items, and the
other group of examinee responses is used to compute Stout's statistic T.
Step 2: The second group of examinees (if the first group of examinees is used for
FA) is partitioned into K subgroups based on their PT score. That is, all examinees
obtaining the same total score on PT are assigned to the same subgroup k (k = 1, 2, ..., K).

Step 3: Within each subgroup, examinee responses to the subtest items of AT1 and AT2
are used to compute the unidimensionality statistic T given by

And the standard error of estimate for subgroup k is given by


where 


and 


The computed T-value is referred to the upper tail of the standard normal
distribution to obtain the significance level. The significance levels (p-values) associated
with unidimensional tests are expected to be large, whereas those associated with
multidimensional tests are expected to fall within the specified level of
significance.

DIMTEST assesses the degree of closeness of an essentially unidimensional model to
the model generating the observed data. This is done by splitting the test items into three
subtests (AT1, AT2, and PT) as described above. When the model underlying the test
item responses is close to essentially unidimensional, items of AT1, AT2, and PT would all
be of the same dominant dimension; therefore, the value of the statistic T computed based
on AT1 and AT2 would be "small," leading to the tenability of H_0. When the model
underlying the test responses is not essentially unidimensional, however, items of AT1
would be dimensionally different from items of AT2 and PT, and the value of the statistic
T will be "large," leading to the rejection of H_0.

DIMTEST has been found to discriminate between unidimensional and
two-dimensional tests for a variety of simulated test data when the correlation between
abilities is as high as .7 (Stout, 1987; Nandakumar & Stout, in press). Nandakumar (1991)
has shown the usefulness of DIMTEST to assess essential unidimensionality in the possible
presence of several minor abilities. The findings indicate that essential unidimensionality is
established when each of the minor abilities influences relatively few items, or, if minor
abilities are influencing many items, the strength of the influence of the minor abilities is
low. As the strength of the minor abilities increases, the approximation to an essentially
unidimensional model degenerates, inflating the Type I error of the test of the hypothesis of
essential unidimensionality. Nandakumar (in press) has further replicated these findings on
a wide variety of real test data. That study also demonstrates the sensitivity of DIMTEST
to major and minor abilities influencing item responses.

Description  of  Test  Data 
The  Simulated  Test  Data 

Seven data sets, DATA1 through DATA7, were generated. Of the seven, three data sets,
DATA1 through DATA3, are strictly unidimensional, consisting of 25, 40, and 50 items,
respectively. The other four data sets, DATA4 through DATA7, are two-dimensional, with test
length N = 25 and correlation between abilities ρ = .3, N = 25 and ρ = .7, N = 50 and ρ = .3,
and N = 50 and ρ = .7, respectively. All seven data sets have 2000 examinees. These data set
characteristics are summarized in Table 1.

Table  1  about  here 

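The unidimensional data sets were generated using the three-parameter logistic
model given by

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\{-1.7\, a_i (\theta - b_i)\}},$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and guessing parameters of item $i$.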


The abilities (θ) were independently generated from the standard normal distribution, and
the item parameters of real tests as described in Nandakumar (1991) were used in
generating item responses. For example, items of DATA1 have a larger variability in
discrimination power ($a_i$), ranging from 1.22 to 2.82; items of DATA2 have a smaller
variability of $a_i$ values, ranging from 1.07 to 2.00. For each simulated examinee, the probability
of correctly answering each item, $P_i(\theta)$, was computed using the three-parameter logistic
model. For each item i, a random number between 0 and 1 was generated from a uniform
distribution. If the computed probability, $P_i(\theta)$, was greater than or equal to the random
number generated, the examinee was said to have answered the item correctly and was
given a score of 1; otherwise the examinee was given a score of 0. The two-dimensional test
data were generated according to the multidimensional compensatory model (Reckase &
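McKinley, 1983) given by

$$P_i(\theta_1, \theta_2) = c_i + \frac{1 - c_i}{1 + \exp\{-1.7\,[a_{1i}(\theta_1 - b_{1i}) + a_{2i}(\theta_2 - b_{2i})]\}}. \tag{6}$$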


The abilities θ = (θ_1, θ_2) were sampled from a bivariate normal distribution with
both means zero and both variances one. Two levels of correlation between the
abilities were used: .3 and .7. The guessing level was taken to be .20 for all tests. The
discrimination parameters for each item were independently generated as follows:

$$a_{1i} \sim N(\mu, \sigma), \qquad a_{2i} \sim N(\mu, \sigma),$$


where μ and σ are the mean and standard deviation of the distribution of discrimination
parameters of the respective unidimensional tests with the same number of items. Similarly,
$b_{1i}$ and $b_{2i}$ were assumed to be independent of each other for each item and were generated
as follows:

$$b_{1i} \sim N(\mu, \sigma), \qquad b_{2i} \sim N(\mu, \sigma),$$

where μ and σ are the mean and standard deviation of the distribution of difficulty
parameters of the respective unidimensional test with the same number of items. For
example, to generate test data DATA4 with N = 25 and ρ = .3, the means and standard
deviations of the $a_i$ and $b_i$ item parameters used for DATA1 were used. The item responses
(0, 1) were generated exactly as described for the unidimensional case by using $P_i(\theta)$ of (6).
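
The generation procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: the scaling constant 1.7, the function names, and the argument layout are assumptions of the sketch, and the item parameters must be supplied by the caller.

```python
import numpy as np

D = 1.7  # logistic scaling constant (an assumption of this sketch)

def simulate_3pl(a, b, c, n_examinees, rng):
    """Unidimensional 0/1 responses under the three-parameter logistic model."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    theta = rng.standard_normal(n_examinees)           # abilities ~ N(0, 1)
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta[:, None] - b)))
    u = rng.random(p.shape)                            # uniform(0, 1) draws
    return (p >= u).astype(int)                        # correct if P >= draw

def simulate_2d_compensatory(a1, a2, b1, b2, c, rho, n_examinees, rng):
    """Two-dimensional 0/1 responses under a compensatory model."""
    a1, a2 = np.asarray(a1, dtype=float), np.asarray(a2, dtype=float)
    b1, b2 = np.asarray(b1, dtype=float), np.asarray(b2, dtype=float)
    # Bivariate normal abilities: means 0, variances 1, correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_examinees)
    z = a1 * (theta[:, [0]] - b1) + a2 * (theta[:, [1]] - b2)
    p = c + (1 - c) / (1 + np.exp(-D * z))
    u = rng.random(p.shape)
    return (p >= u).astype(int)
```

Both functions return an examinee-by-item matrix of 0/1 scores. Passing a seeded generator such as `np.random.default_rng(0)` makes a simulated data set reproducible.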

The  Real  Test  Data 

The real test data used in this study came from two different sources. The National
Assessment of Educational Progress (NAEP, 1988) data for the 1986 US History (HIST)
and Literature (LIT) tests for grade 11/age 17 were obtained from Educational Testing Service.
The Armed Services Vocational Aptitude Battery (ASVAB) data for Arithmetic Reasoning
(AR) and General Science (GS) for grade 10 were obtained from Linn, Hastings, Hu, and
Ryan (1987). For all data sets, examinees who missed one or more items were deleted from
the analyses. Test sizes and sample sizes for all real tests are given in the bottom half of
Table 1. Since all four tests were assessed as unidimensional by the methods employed
in this article (details are provided in the Results section), they were combined to form
two-dimensional tests. Four two-dimensional tests were formed as follows. The test data
HSTLIT1 was formed by combining the data of 31 items of HIST with the data of 5 items
of LIT randomly selected from 30 items. Similarly, HSTLIT2 was formed by combining the
responses of 31 items of HIST with the responses of 10 items of LIT, and the test data GS
was formed by combining responses of 30 items of AR with the responses of 10 items of GS.
The two-dimensional test HSTGEO contains 31 history items spanning US history from
the colonization period to modern times (HIST) and in addition contains 5 map items
requiring knowledge of the geographical location of different countries in the world. This is
the actual history test according to NAEP. But it was shown using DIMTEST that the 5
map items formed a separate dimension significantly different from the history items
(Nandakumar, in press). Hence the data on these 5 map items were removed from the
history test to form HIST with 31 items, and the original history data were treated as a
natural two-dimensional test.

Results 

The  results  of  DIMTEST  and  the  H&R  approach  will  be  studied  together  and 
compared  because  of  the  similarity  in  the  underlying  theory  and  because  both  of  them  are 
statistical  tests.  Likewise  the  results  of  linear  and  nonlinear  factor  analysis  will  be  studied 
and  compared  together. 


The  Simulated  Test  Data 


DIMTEST and H&R Procedure

The results of DIMTEST and the H&R approach for simulated data are presented
at the top of Table 2. For all data sets, the significance levels associated with DIMTEST
indicate that DIMTEST is able to correctly confirm unidimensionality and detect lack of
unidimensionality for both levels of correlation between abilities, ρ = .3 and ρ = .7. For
example, all three unidimensional data sets, DATA1 through DATA3, have small T-values and
large significance levels (p-values), implying the acceptance of the null hypothesis of essential
unidimensionality (here the data were simulated as strictly unidimensional).
Two-dimensional data, DATA4 through DATA7, on the other hand, have large T-values, strongly
rejecting the null hypothesis of essential unidimensionality.


Table  2  about  here 


The results of the H&R approach indicate that, for the unidimensional tests, the number
of significant negative partial associations at level α (α=.05) is far below the expected
number (tα, where t is the number of item pairs), strongly confirming the unidimensional
nature of these data sets. Among the
two-dimensional data sets, DATA4 and DATA6 (ρ=.3) were correctly assessed as
multidimensional. For these data, the number of significant negative partial associations at
level α exceeded tα, and the number of significant negative partial associations
beyond level α/t was 15 and 1, respectively, identifying them as multidimensional. The
test data DATA5 and DATA7 (ρ=.7), on the other hand, were assessed as unidimensional.
For DATA5 and DATA7, the number of significant negative partial associations at level α
was within tα, and the number of significant negative partial associations beyond
level α/t was zero, so both were classified as unidimensional. It was disappointing to note
that, for many of the item pairs measuring different traits in the two-dimensional tests, the
covariance did not approach significance. One reason for this could be the noise in the conditional
score. More research is necessary to draw definite conclusions.
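
The two counts used above (pairs significant at level α, and pairs significant beyond the Bonferroni level α/t) can be turned into a decision mechanically. The sketch below is one reading of the rule that is consistent with the decisions listed in Table 2; it is an assumption inferred from that table, not the authors' program, and the function name `hr_decision` is hypothetical.

```python
def hr_decision(n_items, n_sig_alpha, n_sig_bonferroni, alpha=0.05):
    """Classify a test from counts of significant negative partial associations.

    Assumed rule (inferred from Table 2): reject when the count at level
    alpha exceeds t * alpha AND at least one pair is significant beyond
    alpha / t; accept when neither holds; otherwise leave it undecided.
    """
    t = n_items * (n_items - 1) // 2      # number of item pairs
    expected = t * alpha                  # nominal count under the null
    over_nominal = n_sig_alpha > expected
    any_bonferroni = n_sig_bonferroni > 0
    if over_nominal and any_bonferroni:
        return t, expected, "reject"
    if not over_nominal and not any_bonferroni:
        return t, expected, "accept"
    return t, expected, "undecided"

# Counts as reported in Table 2: (name, test length, count at alpha, count at alpha/t).
for name, n, at_alpha, at_bonf in [("DATA4", 25, 71, 15), ("DATA5", 25, 10, 0),
                                   ("DATA6", 50, 206, 1), ("DATA7", 50, 56, 0)]:
    print(name, hr_decision(n, at_alpha, at_bonf))
```

For a 25-item test there are 300 item pairs, so the nominal count at α=.05 is 15; DATA4, with 71 significant pairs, is well above it, while DATA5, with 10, is not.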

Linear  and  Nonlinear  Factor  Analysis 

The computer programs used for these analyses, LISCOMP and NOFA, are
computationally intensive and consume enormous CPU time. In addition, LISCOMP cannot
handle more than about 40 variables. For these reasons, not all data sets were included
in the linear factor analyses, but all data sets were included in the nonlinear factor
analyses. The results of the linear and nonlinear factor analyses are presented in Table 3.


Table 3 about here


Based on parallel analyses, one factor would be retained for DATA1, DATA2, and
DATA5; two factors would be retained for DATA4. According to the significance
levels associated with the chi-square test of goodness of fit in Table 3, however, a
two-factor model fits DATA1, a four-factor model fits DATA2 and DATA4, and a
three-factor model fits DATA5. Similar chi-square values are not available for nonlinear models.

The goodness of fit statistics (the means and standard deviations of squared
residuals and absolute residuals) are reported for all data sets in Table 3. The top entry in
Table 3 refers to random data (RANDOM) with 25 variables and 2000 examinees. Because
of the cost of computations, only one random data set was used to compare the goodness of
fit statistics. Comparing the goodness of fit statistics of RANDOM with those of DATA1, it
appears that both the one-factor quadratic and one-factor cubic models fit as well as the
four-factor linear model. However, since the differences in the magnitude of residuals among
models are small, one could argue that the four-factor linear and one-factor quadratic or cubic
models overfit and that one should choose a more parsimonious model. The
significance levels of the chi-square test of goodness of fit indicate that the two-factor
model fits the data. If one strictly applies the criterion of using random data residuals as a
guide to determine the number of factors, however, a one-factor model with a quadratic
term seems to be the right choice. Similar observations can be made for DATA2.
Comparing the goodness of fit statistics for linear and nonlinear factor analysis, it can be seen
that for DATA4 and DATA5, the two-factor quadratic model fits better than the
three-factor linear model, confirming the two-dimensional nature of the data. Here again one
could argue, based on the absolute residuals, that the differences in the residuals are small
and that the quadratic models or the three-factor and four-factor linear models overfit.
The significance levels associated with the chi-square test indicate overestimation of factors
for DATA4. As expected, the means and the standard deviations of squared residuals and
absolute residuals are much larger for DATA4 (ρ=.3) than for DATA5 (ρ=.7), reflecting
more deviation from unidimensionality for DATA4. For DATA5, the goodness of fit
analyses support a one-factor quadratic model. Likewise, the two-factor quadratic model
fits DATA6, and the one-factor quadratic model fits DATA7.
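
The residual summaries discussed above are simple functions of the off-diagonal differences between the observed and model-implied correlation matrices. The following is a minimal sketch of that computation, assuming both matrices are already available; it is not the NOFA or LISCOMP output routine, the function name `residual_fit_stats` is hypothetical, and the small matrices are made-up numbers used only to show the call.

```python
import numpy as np

def residual_fit_stats(observed_corr, fitted_corr):
    """Summaries of off-diagonal residual correlations r_ij = observed - fitted."""
    observed = np.asarray(observed_corr, dtype=float)
    fitted = np.asarray(fitted_corr, dtype=float)
    rows, cols = np.triu_indices(observed.shape[0], k=1)  # each item pair once
    r = observed[rows, cols] - fitted[rows, cols]
    # np.std uses the population formula (ddof=0); that choice is an assumption.
    return {
        "mean_sq": float(np.mean(r ** 2)),
        "sd_sq": float(np.std(r ** 2)),
        "mean_abs": float(np.mean(np.abs(r))),
        "sd_abs": float(np.std(np.abs(r))),
    }

# Illustrative (made-up) 3-item correlation matrices.
observed = [[1.00, 0.30, 0.20], [0.30, 1.00, 0.10], [0.20, 0.10, 1.00]]
fitted = [[1.00, 0.28, 0.23], [0.28, 1.00, 0.10], [0.23, 0.10, 1.00]]
stats = residual_fit_stats(observed, fitted)
print(stats)
```

Smaller values on all four summaries indicate a closer fit; the comparison against the RANDOM data set supplies a rough baseline for how small the residuals can be expected to get.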

In summary, there are many criteria that can be used to assess dimensionality with the
linear factor analysis approach. The different criteria may give rise to different conclusions
regarding the dimensionality of the data set under consideration. In the present study it is
shown that the significance levels associated with the chi-square test overestimated the
number of factors in most cases. Parallel analyses correctly identified the dimensionality in
some cases. Nonlinear factor analyses exhibited a better fit than the linear factor analyses.
DIMTEST and H&R procedures were excellent in confirming unidimensionality.
DIMTEST demonstrated greater power in detecting multidimensionality, even for correlations
between abilities as high as .7. The H&R and nonlinear factor analysis methods demonstrated
good power provided the correlation between abilities was low (ρ=.3).

The  Real  Test  Data 


DIMTEST  and  H&R  Procedure 

The results of DIMTEST and H&R for the real data sets are presented at the bottom of
Table 2. For data sets LIT, HIST, AR, and GS, the T-values associated with DIMTEST
indicate that these data can be approximated by an essentially unidimensional model. The
results of the H&R approach for these data are consistent with the DIMTEST results in that
the number of significant negative partial associations, for each of the tests, is less than
the nominal level tα. While both approaches strongly support that HIST, AR, and GS are
essentially unidimensional, the decision is not clear for LIT because there is one negative
partial association that is significant beyond level α/t, and the T-value of DIMTEST
(T=1.70, p=.045) is in the borderline region, indicating the presence of violations of the
unidimensionality hypothesis.

For the two-dimensional data sets HSTLIT1, HSTLIT2, ARGS, and HSTGEO, the
T-values associated with DIMTEST strongly indicate the multidimensional nature of these
data. The relatively large T-values associated with ARGS and HSTGEO indicate that the abilities
within these tests are more nearly orthogonal than the abilities in HSTLIT1 and HSTLIT2. The
results based on the H&R approach, however, indicate that all four data sets are
unidimensional. For each of the two-dimensional data sets, the number of significant
negative partial associations is well below the nominal level tα, and none of the partial
associations is significant beyond level α/t. Even with a liberal α = .10, the number of
negative partial associations did not rise above the nominal level for any of the tests. These
results suggest that the H&R approach lacks power.

On further examination of the H&R results, it was found that the M-H Z-values for
many of the item pairs in which the items were supposed to be measuring different traits did
not reach significance. One explanation for this could be that, for these item pairs, the
conditional score (ΣX_k), on the basis of which the examinees are classified into different
groups, may be contaminated with items tapping different abilities. This could be
especially true for HSTLIT2 and ARGS, where one quarter of the test items are from the
second dominant dimension. Because of the noise in the conditional score distribution,
item pairs measuring different abilities may not exhibit significantly
negative covariance. A proper conditional score may considerably increase the power of the
H&R approach.
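
To make the role of the conditional score concrete, the sketch below computes a Mantel-Haenszel Z statistic for one item pair, stratifying examinees by their score on the remaining items. It is an illustration of the general technique under stated assumptions, not the program used in this study: the rest score is assumed to be the sum of all items other than the pair, no continuity correction is applied, and the function name `mh_z` and the toy data are hypothetical.

```python
import numpy as np

def mh_z(responses, i, j):
    """Mantel-Haenszel Z for items i and j, stratified by the rest score.

    responses: (examinees x items) array of 0/1 item scores.
    A negative Z points to a negative conditional association.
    """
    x = np.asarray(responses, dtype=int)
    rest = x.sum(axis=1) - x[:, i] - x[:, j]    # score on the remaining items
    observed = expected = variance = 0.0
    for s in np.unique(rest):
        stratum = x[rest == s]
        n = len(stratum)
        if n < 2:
            continue                             # variance undefined for n = 1
        n_i = int(stratum[:, i].sum())           # correct on item i
        n_j = int(stratum[:, j].sum())           # correct on item j
        both = int(np.sum(stratum[:, i] * stratum[:, j]))  # correct on both
        observed += both
        expected += n_i * n_j / n
        variance += n_i * (n - n_i) * n_j * (n - n_j) / (n ** 2 * (n - 1))
    if variance == 0:
        return 0.0
    return (observed - expected) / variance ** 0.5

# Toy data: items 0 and 1 always disagree; item 2 is constant, so there is one stratum.
toy = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(mh_z(toy, 0, 1))
```

If the rest score mixes items from both dimensions, examinees within a stratum are less homogeneous on either trait, which is the contamination the paragraph above describes.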

Linear  and  Nonlinear  Factor  Analysis 

The results of linear and nonlinear factor analysis for a selection of real data sets are
reported in Table 4. The results are consistent with those for the simulated test data in that,
in all cases, nonlinear factor models fit better than linear factor models. According to the
chi-square test of goodness of fit, the four-factor model was the best fitting for all data sets
on which linear factor analysis was performed. Based on the goodness of fit statistics, a
one-factor quadratic model fits LIT, AR, and HSTLIT1 better than three- or four-factor linear
models. Since a one-factor quadratic model fits as well as a two-factor quadratic model, the
more parsimonious model is strongly recommended in these cases. For HSTLIT2 and
ARGS, again it appears that a one-factor quadratic model is appropriate. Had chi-square
statistics been available along with the goodness of fit statistics for the nonlinear factor
analyses, they would have aided in the interpretation.


Table  4  about  here 


In summary, for the real data sets, the results are somewhat consistent with those for the
simulated data sets. For data sets assessed as unidimensional by DIMTEST and H&R, the
chi-square tests based on linear factor analysis indicated a four-factor model.
Although we do not know the true dimensionality of real data, these results suggest that
linear factor analysis overestimates the underlying dimensionality. The other
three methodologies, by contrast, were excellent in identifying essential unidimensionality
but differed in identifying lack of unidimensionality. DIMTEST demonstrated greater power
than either the H&R or the nonlinear factor analysis methods. It appears that, with an
appropriate conditional score, the power of the H&R approach could be improved, and that,
with some type of fit statistic and associated significance levels, the power of nonlinear
factor analysis could be improved.

Discussion

Based  on  this  limited  study,  findings  demonstrate  that  the  linear  factor  analysis 
approach  to  assessing  essential  unidimensionality  is  not  satisfactory.  This  finding  is 
consistent  with  the  previous  research  and  theory  (see  for  example,  Hambleton  &  Rovinelli, 
1986;  Hattie,  1984).  In  contrast  to  linear  factor  analysis,  DIMTEST,  H&R,  and  nonlinear 
factor  analysis  were  each  shown  to  be  promising  methodologies  to  assess  dimensionality. 

In this study, all three methodologies were sensitive enough to discriminate between
one- and two-dimensional test data. For simulated unidimensional test data, all three
procedures were able to confirm unidimensionality. For the real data, all three procedures
were consistent in identifying the unidimensionality of HIST, AR, and GS. For
two-dimensional test data, however, the three procedures differed in their ability to detect
the lack of unidimensionality. DIMTEST rejected the null hypothesis of essential
unidimensionality for all two-dimensional tests, both real and simulated. The H&R
approach confirmed the lack of unidimensionality for two-dimensional simulated tests,
provided the correlation between abilities was low (ρ=.3). For simulated test data with
high correlation between abilities (ρ=.7), the H&R approach was unable to detect
multidimensionality. Also, for all two-dimensional real test data, the H&R approach was
unable to detect multidimensionality.

The performance of the nonlinear factor analysis methodology was similar to that of the
H&R procedure for two-dimensional data sets. For simulated test data with ρ=.3, the
two-factor model with linear and quadratic terms demonstrated adequate fit statistics
(smaller means and standard deviations of squared residuals and absolute residuals). For
simulated tests with ρ=.7, however, the difference in fit statistics between one-factor and
two-factor quadratic models was not evident. Similarly, for the two-dimensional real test data
HSTLIT2 and ARGS, the difference in fit statistics between one-factor and two-factor
models with linear and quadratic terms was not evident. The difficulty in deciding on
the correct model arises because there is no concrete way of assessing what is meant by
"sufficiently small" for goodness of fit statistics.

In this study, the results associated with the H&R approach were consistent with
the findings of Ben-Simon and Cohen (1990) and Zwick (1987). The number
of significant negative partial associations for unidimensional tests was far below the
expected five percent level, making it a very conservative test. Consequently, it did not
exhibit high power. The reason one observes fewer than the nominal number of negative
partial associations is that the conditional score used in computing the covariances is not
perfectly correlated with the latent variable (Zwick, 1987). According to the theorems
proved by Holland and Rosenbaum (1986), the conditional score used to compute the
covariances can be any function of the latent trait. An appropriate choice of conditional
score, therefore, could maximize the power of the H&R approach.

The  results  of  nonlinear  factor  analyses  were  consistent  with  the  findings  of 
Hambleton  and  Rovinelli  (1986).  Factor  models  with  linear  and  quadratic  terms  were  able 
to  fit  the  data  better  than  models  with  just  linear  terms.  The  problem  with  nonlinear 
factor  analysis  is  determining  the  appropriate  number  of  polynomial  terms  to  retain  in  the 
model.  This  problem  suggests  that  some  type  of  adequacy  of  fit  statistics  with  associated 
sampling  distribution  would  be  necessary  to  aid  in  assessing  the  fit  of  nonlinear  models. 

In terms of assessing the degree of multidimensionality, both the DIMTEST and
nonlinear factor analysis approaches can be useful. The T-values associated with
DIMTEST and the fit statistics associated with nonlinear factor analysis can be helpful in
assessing the degree of multidimensionality. For example, both HIST and AR are
considered essentially unidimensional data sets, and the associated T-values are -1.53
and 1.18, respectively. By contrast, for the two-dimensional data set HSTLIT2, T=2.03. The
difference in the T-values mirrors the degree of multidimensionality present in the data.
Similarly, the difference in fit statistics between one-factor and two-factor quadratic
models for DATA1 and DATA4 reflects the degree of multidimensionality.

In the present study, the test lengths are 25 items or more, and the sample sizes are
around 2000 examinees. It is not known if the results would hold up for small test lengths
and sample sizes. De Champlain and Gessaroli (1991) have shown that DIMTEST loses
power when both the test length and the sample size are small (for example, N=25 and
J=500). Their results support the use of the incremental fit index (IFI) from the
nonlinear factor analysis program NOHARM II to assess dimensionality in cases of
smaller test lengths and sample sizes. Ben-Simon and Cohen (1990) found that the
test length and the sample size had a marked effect on the M-H Z-statistic in the
detection of multidimensionality. In their study they tried test lengths of 20, 30, 40, and 50
and sample sizes of 1000, 2000, 3000, and 4000. They found that larger samples and longer
tests facilitated the detection of multidimensionality. They urge a cautious interpretation
of M-H test results in light of test lengths and sample sizes.

Just as the linear and nonlinear factor analysis methodologies share the same philosophical
theory, the DIMTEST and H&R approaches share the same theoretical framework. The basic
rationale for the H&R approach is to reject the locally independent, monotone, unidimensional
model if the conditional covariances are significantly negative. By contrast, DIMTEST
rejects the essentially independent, monotone, essentially unidimensional model if the
conditional covariances are significantly positive (it can be shown that the expected value
of the numerator of Stout's statistic T is mathematically equivalent to the average conditional
covariance among AT1 items; Stout, 1987). This apparent contradiction in the criterion
for assessing unidimensionality may be resolved by noting the subtle difference in the item pair
covariances under consideration. In the H&R approach, one expects the conditional
covariance between items measuring different traits to be negative, whereas in Stout's
approach, one expects the asymptotic conditional covariance between items measuring the
same trait to approach zero. DIMTEST is specifically designed to assess unidimensionality
and thus looks for the existence of at least two dominant dimensions. By contrast, the
H&R approach looks at all item pairs and detects items that are not measuring the same
trait as other items of the test.

As for the computational time involved, DIMTEST is the most efficient. The
other procedures require considerably more time. For example, for a
25-item test with 2000 examinees, DIMTEST uses 4 seconds of CPU time, the H&R approach
uses 24 seconds, and nonlinear factor analysis uses 42 seconds; for a 50-item test with 2000
examinees, DIMTEST uses 8 seconds, the H&R approach uses 106 seconds, and nonlinear
factor analysis uses 191 seconds. As the test length increases, the H&R approach requires
disproportionately more time, and the same is true for nonlinear factor analysis as the test
length increases and/or the model gets more complex.
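
The growth in CPU time can be set against the growth in the number of item pairs, which is the unit of work for a pairwise procedure such as H&R. The short sketch below only rearranges the timings quoted above; linking the H&R cost to the number of item pairs is an interpretation offered here, not a claim made in the study, and the function name `n_item_pairs` is illustrative.

```python
def n_item_pairs(n_items: int) -> int:
    """Number of distinct item pairs in a test of n_items items."""
    return n_items * (n_items - 1) // 2

# CPU seconds quoted in the text for 25-item and 50-item tests (2000 examinees).
cpu_seconds = {
    "DIMTEST": (4, 8),
    "H&R": (24, 106),
    "Nonlinear FA": (42, 191),
}

pair_ratio = n_item_pairs(50) / n_item_pairs(25)   # 1225 pairs versus 300 pairs
print(f"item pairs grow by a factor of {pair_ratio:.2f}")
for method, (t25, t50) in cpu_seconds.items():
    print(f"{method}: CPU time grows by a factor of {t50 / t25:.2f}")
```

Doubling the test length doubles the DIMTEST time, whereas the H&R and nonlinear factor analysis times grow by more than a factor of four, slightly faster than the number of item pairs.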

Notes


1. The reader is reminded that testing for unidimensionality is not synonymous with testing
for model-data fit. If a unidimensional model is to be applied to the data, testing for
unidimensionality is the first step. If the item responses are essentially unidimensional, then,
as a second step, one can test for model-data fit using, for example, the one-parameter
logistic model, the two-parameter logistic model, and so on.

Table 1

Description of Data Sets

                                          Number of items of each trait
Name       J(a)    Traits   ρ(b)   N(c)   Trait 1   Trait 2   Mixed*

Simulated data sets

DATA1      2000    1               25     25        0         0
DATA2      2000    1               40     40        0         0
DATA3      2000    1               50     50        0         0
DATA4      2000    2        .3     25     8         8         9
DATA5      2000    2        .7     25     8         8         9
DATA6      2000    2        .3     50     16        16        17
DATA7      2000    2        .7     50     16        16        17

Real data sets

LIT        2439    1               30     30        0         0
HIST       2428    1               31     31        0         0
AR         1984    1               30     30        0         0
GS         1990    1               25     25        0         0
HSTLIT1    2428    2        --     36     31        5         0
HSTLIT2    2428    2        --     41     31        10        0
ARGS       1853    2        --     40     30        10        0
HSTGEO     2440    2        --     36     31        5         0

(a) J denotes the number of examinees
(b) ρ denotes the correlation between traits
(c) N denotes the test length
* Mixed items are a combination of both traits 1 and 2


Table 2

Results of DIMTEST and H&R Analyses

           DIMTEST                    H&R Test
           $H_0\colon d_E = 1$        $H_0\colon \mathrm{cov}(X_i, X_j \mid \sum_k X_k) \ge 0$

                                      No. of   No. of pairs   No. of pairs   Decision
                          Decision    item     significant    significant    based on
                          based on    pairs    at level α     at level α/t   Bonferroni
Name       T       p<     DIMTEST     t                                      bounds

Simulated test data

DATA1      -1.05   .85    accept H0   300      1              0              accept H0
DATA2      -0.75   .77    accept      780      3              0              accept
DATA3      -0.94   .83    accept      1225     10             0              accept
DATA4       7.19   .000   reject      300      71             15             reject
DATA5       3.62   .000   reject      300      10             0              accept
DATA6      10.13   .000   reject      1225     206            1              reject
DATA7       2.41   .008   reject      1225     56             0              accept

Real test data

LIT         1.70   .045   accept      435      16             1              undecided
HIST       -1.53   .937   accept      465      6              0              accept
AR          1.18   .118   accept      435      3              0              accept
GS         -0.14   .555   accept      300      6              0              accept
HSTLIT1     3.01   .036   reject      630      17             0              accept
HSTLIT2     2.03   .021   reject      820      18             0              accept
ARGS        6.15   .000   reject      780      4              0              accept
HSTGEO      6.19   .000   reject      630      17             0              accept

* significant at .05 level


Table 3

Results of Linear and Nonlinear Factor Analysis
for Simulated Test Data: Goodness of Fit Statistics

Model                     Mean r_ij^2*  SD(r_ij)   Mean |r_ij|  SD(|r_ij|)   p<**

RANDOM
  Linear Factor Analysis
    1 Factor              .0009         .0308      .0250        .0182
    2 Factor              .0008         .0283      .0225        .0169
    3 Factor              .0007         .0246      .0207        .0160
    4 Factor              .0006         .0245      .0196        .0147

DATA1
  Linear Factor Analysis
    1 Factor              .0017         .0412      .0333        .0242        .006
    2 Factor              .0013         .0359      .0286        .0218        .350
    3 Factor              .0011         .0332      .0262        .0204        .610
    4 Factor              .0009         .0303      .0236        .0191        .860
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0003         .0185      .0147        .0113
    1 Factor Cubic        .0003         .0185      .0147        .0113

DATA2
  Linear Factor Analysis
    1 Factor              .0110         .1049      .0982        .0369        .000
    2 Factor              .0091         .0954      .0896        .0327        .000
    3 Factor              .0070         .0834      .0774        .0310        .000
    4 Factor              .0061         .0779      .0720        .0278        .000
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0003         .0186      .0148        .011?
    1 Factor Cubic        .0003         .0185      .0148        .0113

DATA3
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0003         .0186      .0147        .0115
    1 Factor Cubic        .0003         .0175      .0138        .0108

DATA4
  Linear Factor Analysis
    1 Factor              .0203         .1425      .1108        .0900        .000
    2 Factor              .0017         .0412      .0334        .0240        .000
    3 Factor              .0012         .0346      .0276        .0212        .008
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0021         .0465      .0523        .0379
    2 Factor Quadratic    .0003         .0171      .0131        .0109

DATA5
  Linear Factor Analysis
    1 Factor              .0047         .0686      .0556        .0409        .000
    2 Factor              .0014         .0374      .0313        .0218        .011
    3 Factor              .0012         .0346      .0289        .0199        .245
    4 Factor              .0010         .0316      .0254        .0181        .600
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0009         .0307      .0246        .0186
    2 Factor Quadratic    .0003         .0174      .0138        .0107

DATA6
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0005         .0242      .0204        .0172
    2 Factor Quadratic    .0003         .0182      .0145        .0111

DATA7
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0005         .0223      .0176        .0137
    2 Factor Quadratic    .0003         .0175      .0140        .0105

Nonlinear models:
  1 Factor Quadratic: $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + d_i u_i$
  1 Factor Cubic:     $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + b_{i3}\theta^3 + d_i u_i$
  2 Factor Quadratic: $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + d_i u_i$

* r_ij are the residual correlations
** p-value associated with the chi-square test of goodness of fit


Table 4

Results of Linear and Nonlinear Factor Analysis
for Real Test Data: Goodness of Fit Statistics

Model                     Mean r_ij^2*  SD(r_ij)   Mean |r_ij|  SD(|r_ij|)   p<**

LIT
  Linear Factor Analysis
    1 Factor              .0034         .0584      .0465        .0354        .000
    2 Factor              .0028         .0526      .0428        .0307        .000
    3 Factor              .0019         .0439      .0349        .0267        .000
    4 Factor              .0015         .0391      .0310        .0240        .000
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0008         .0278      .0216        .0176
    2 Factor Quadratic    .0004         .0207      .0162        .0130

AR
  Linear Factor Analysis
    1 Factor              .0047         .0683      .0569        .0378        .000
    2 Factor              .0032         .0561      .0468        .0310        .000
    3 Factor              .0024         .0489      .0400        .0281        .000
    4 Factor              .0020         .0447      .0362        .0262        .000
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0007         .0265      .0200        .0174
    2 Factor Quadratic    .0004         .0190      .0146        .0122

HSTLIT1
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0008         .0275      .0213        .0175
    2 Factor Quadratic    .0003         .0185      .0143        .0118

HSTLIT2
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0006         .0236      .0181        .0152
    2 Factor Quadratic    .0004         .0191      .0150        .0119

ARGS
  Nonlinear Factor Analysis
    1 Factor Quadratic    .0021         .0462      .0268        .0376
    2 Factor Quadratic    .0004         .0192      .0003        .0123
    3 Factor Quadratic    .0004         .0175      .0003        .0111

Nonlinear models:
  1 Factor Quadratic: $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + d_i u_i$
  2 Factor Quadratic (LIT, AR): $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + d_i u_i$
  2 Factor Quadratic (HSTLIT1, HSTLIT2, ARGS): $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i23}\theta_1\theta_2 + d_i u_i$
  3 Factor Quadratic: $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i31}\theta_3 + b_{i32}\theta_3^2 + b_{i33}\theta_1\theta_2 + b_{i34}\theta_1\theta_3 + b_{i35}\theta_2\theta_3 + d_i u_i$

* r_ij are the residual correlations
** p-value associated with the chi-square test of goodness of fit